CROCUS: Cluster-based Ontology Data Cleansing

Authors

  • Didier Cherix
  • Ricardo Usbeck
  • Andreas Both
  • Jens Lehmann
Abstract

Over the past years, a vast number of datasets have been published based on Semantic Web standards, which provides an opportunity for creating novel industrial applications. However, industrial requirements on data quality are high, while the time to market and the cost of data preparation have to be kept low. Unfortunately, many Linked Data sources are error-prone, which prevents their direct use in productive systems. Hence, (semi-)automatic quality assurance processes are needed, as manual ontology repair by domain experts is expensive and time-consuming. In this article, we present CROCUS, a pipeline for cluster-based ontology data cleansing. Our system provides a semi-automatic approach to instance-level error detection in ontologies that is agnostic of the underlying Linked Data knowledge base and works at very low cost. CROCUS was evaluated on two datasets. The experiments show that we are able to detect errors with high recall.

Similar resources

Lessons Learned - The Case of CROCUS: Cluster-Based Ontology Data Cleansing

Joint proceedings of Second International Workshop on Semantic Web Enterprise Adoption and Best Practice (WaSABi 2014) & Second International Workshop on Finance and Economics

A Fuzzy Ontology-Based Platform for Flexible Querying

Flexible queries have recently received increasing attention as a way to better characterize data retrieval. In this paper, a new flexible querying approach using ontological knowledge is proposed. The approach presents an FCA-based methodology for building ontologies from scratch and then querying them intelligently through the fusion of conceptual clustering, fuzzy logic, and FCA. The main contr...

Medicinal plants and food medicines in the folk traditions of the upper Lucca Province, Italy.

An ethnopharmacobotanical survey of the medicinal plants and food medicines of the northern part of Lucca Province, north-west Tuscany, central Italy, was carried out. The geographical isolation of this area has permitted the survival of a rich folk phytotherapy involving medicinal herbs and also vegetable resources used by locals as food medicine. Among these are the uncommon use of Ballota ni...

Publication date: 2014